Partially Supervised Graph Embedding for Positive Unlabelled Feature Selection

Authors

  • Yufei Han
  • Yun Shen
Abstract

Selecting discriminative features in positive unlabelled (PU) learning tasks is a challenging problem due to the lack of negative class information. Traditional supervised and semi-supervised feature selection methods cannot be applied directly in this scenario, and unsupervised feature selection algorithms are designed to handle unlabelled data while neglecting the available information from the positive class. To leverage the partially observed positive class information, we propose to encode the weakly supervised information in PU learning tasks into pairwise constraints between training instances. Violations of these pairwise constraints are measured and incorporated into a partially supervised graph embedding model. Extensive experiments on different benchmark databases and a real-world cyber security application demonstrate the effectiveness of our algorithm.
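
The abstract does not spell out the model, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a symmetric kNN graph over all instances, adds must-link edges between every pair of labelled positives as the pairwise constraints, and swaps in the classical Laplacian Score as the per-feature criterion. The function name `constrained_laplacian_scores` and the parameters `k` and `link_weight` are hypothetical.

```python
import numpy as np

def constrained_laplacian_scores(X, pos_idx, k=5, link_weight=1.0):
    """Laplacian Score of each feature on a kNN graph augmented with
    must-link edges between labelled positives (illustrative assumption).
    Lower scores mean the feature varies less across graph edges."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between instances.
    sq = np.sum(X ** 2, axis=1)
    dist = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(dist, np.inf)  # an instance is not its own neighbour
    # Symmetric binary kNN adjacency (the unsupervised part of the graph).
    W = np.zeros((n, n))
    nn = np.argsort(dist, axis=1)[:, :k]
    W[np.repeat(np.arange(n), k), nn.ravel()] = 1.0
    W = np.maximum(W, W.T)
    # Pairwise constraints: strengthen edges between labelled positives.
    pos_idx = np.asarray(pos_idx)
    W[np.ix_(pos_idx, pos_idx)] += link_weight
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    L = np.diag(deg) - W  # unnormalised graph Laplacian
    # Remove the degree-weighted mean of each feature.
    Xc = X - (deg @ X) / deg.sum()
    num = np.sum(Xc * (L @ Xc), axis=0)            # f^T L f per feature
    den = np.sum(Xc * (deg[:, None] * Xc), axis=0)  # f^T D f per feature
    return num / np.maximum(den, 1e-12)

# Toy usage: feature 0 separates two groups, feature 1 is noise.
rng = np.random.default_rng(0)
X = np.column_stack([
    np.concatenate([rng.normal(0.0, 0.5, 20), rng.normal(10.0, 0.5, 20)]),
    rng.normal(0.0, 1.0, 40),
])
scores = constrained_laplacian_scores(X, pos_idx=[0, 1, 2, 3, 4])
ranking = np.argsort(scores)  # feature indices, best first
```

Ranking features by ascending score keeps those that change little between graph neighbours and between labelled positives; a constraint-violation term in an embedding objective, as the abstract describes, would play a similar role but is not reproduced here.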


Similar resources

Markov Blanket Discovery in Positive-Unlabelled and Semi-supervised Data

The importance of Markov blanket discovery algorithms is twofold: as the main building block in constraint-based structure learning of Bayesian network algorithms and as a technique to derive the optimal set of features in filter feature selection approaches. Equally, learning from partially labelled data is a crucial and demanding area of machine learning, and extending techniques from fully t...


Phishing website detection using weighted feature line embedding

The aim of phishing is to trace users' private information without their permission by designing a new website that mimics a trusted website. Information technology specialists do not agree on a unique definition of the discriminative features that characterize phishing websites. Therefore, the number of reliable training samples in phishing detection problems is limited. M...


BASSUM: A Bayesian semi-supervised method for classification feature selection

Feature selection is an important preprocessing step for building efficient, generalizable and interpretable classifiers on high-dimensional data sets. Under the assumption of sufficient labelled samples, the Markov Blanket provides a complete and sound solution to the selection of optimal features, by exploring the conditional independence relationships among the features. In real-world ap...


Feature selection for semi-supervised data analysis in decisional information systems. (Sélection de variables pour l'analyse semi-supervisées des données dans les systèmes d'Information décisionnels)

Feature selection is an important task in data mining and machine learning processes. This task is well known in both supervised and unsupervised contexts. Semi-supervised feature selection is still under development and far from mature. In general, machine learning has been well developed to deal with partially-labeled data. Thus, feature selection has obtained special impor...


Dimensionality Reduction of Hyperspectral Images by Combination of Non-parametric Weighted Feature Extraction (nwfe) and Modified Neighborhood Preserving Embedding (npe)

This paper combines two conventional feature extraction methods (NWFE and NPE) in a novel framework and presents a new semi-supervised feature extraction method called Adjusted Semi supervised Discriminant Analysis (ASEDA). The advantages of this method are overcoming the Hughes phenomenon, automatic selection of unlabelled pixels, extraction of more than L-1 (L: number of classes) features and avoidance ...




Publication date: 2016